NSF PAR Search | NSF Public Access Repository

Mobile-3DCNN: An Acceleration Framework for Ultra-Real-Time Execution of Large 3D CNNs on Mobile Devices

https://doi.org/10.1145/3747842

Niu, Wei; Sun, Mengshu; Li, Zhengang; Chen, Jou-An; Guan, Jiexiong; Shen, Xipeng; Liu, Jun; Zhang, Mei; Wang, Yanzhi; Lin, Xue; et al. (July 2025, ACM Transactions on Architecture and Code Optimization)

Deploying 3D Convolutional Neural Networks (3D CNNs) on mobile devices is challenging, especially when both real-time execution and high inference accuracy are required, because the increasingly large model size and complex structure of 3D CNNs usually demand tremendous computation and memory resources. Weight pruning has been proposed to mitigate this challenge. However, existing pruning schemes are either incompatible with modern parallel architectures, resulting in long inference latency, or subject to significant accuracy degradation. This paper proposes Mobile-3DCNN, an end-to-end 3D CNN acceleration framework based on pruning/compilation co-design that consists of two parts: a novel fine-grained structured pruning scheme enhanced by prune/Winograd adaptive selection, which is mobile-hardware-friendly and achieves high accuracy after pruning, and a set of compiler optimization and code generation techniques enabled by this pruning, which fully translate the pruning benefit into real performance gains. The evaluation demonstrates that Mobile-3DCNN outperforms state-of-the-art end-to-end DNN acceleration frameworks that support 3D CNN execution on mobile devices, namely Alibaba Mobile Neural Network (MNN) and PyTorch-Mobile, with speedups of up to 34× and minor accuracy degradation, showing that high-accuracy large 3D CNNs can run on mobile devices in real time (or even ultra-real time).
Survey: Exploiting Data Redundancy for Optimization of Deep Learning

https://doi.org/10.1145/3564663

Chen, Jou-An; Niu, Wei; Ren, Bin; Wang, Yanzhi; Shen, Xipeng (October 2023, ACM Computing Surveys)

Data redundancy is ubiquitous in the inputs and intermediate results of Deep Neural Networks (DNNs). It offers many significant opportunities for improving DNN performance and efficiency and has been explored in a large body of work. These studies are scattered across many venues and several years. Their targets range from images to videos and texts, and the techniques they use to detect and exploit data redundancy also vary in many respects. There is not yet a systematic examination and summary of these efforts, which makes it difficult for researchers to get a comprehensive view of the prior work, the state of the art, the differences and shared principles, and the areas and directions yet to be explored. This article tries to fill that void. It surveys hundreds of recent papers on the topic, introduces a novel taxonomy that places the various techniques into a single categorization framework, offers a comprehensive description of the main methods used to exploit data redundancy for improving multiple kinds of DNNs on different types of data, and points out a set of research opportunities for future exploration.
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

https://doi.org/10.1145/3577193.3593725

Chen, Jou-An; Sung, Hsin-Hsuan; Shen, Xipeng; Choudhury, Sutanay; Li, Ang (June 2023, ACM)

Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices

Sung, Hsin-Hsuan; Chen, Jou-An; Niu, Wei; Guan, Jiexiong; Ren, Bin; Shen, Xipeng (January 2023, Proceedings of the 2023 USENIX Annual Technical Conference)

As more apps embrace AI, it is increasingly common for multiple Deep Neural Network (DNN)-powered apps to run at the same time on a mobile device. This paper explores scheduling in such multi-instance DNN scenarios on general open mobile systems (e.g., common smartphones and tablets). Unlike closed systems (e.g., autonomous driving systems), where the set of co-running apps is known beforehand, an open mobile system lets the user install or uninstall arbitrary apps at any time, and a centralized solution faces adoption barriers. This work proposes the first known decentralized application-level scheduling mechanism to address the problem. By leveraging the adaptivity of Deep Reinforcement Learning, the solution is shown to make the scheduling of co-running apps converge to a Nash equilibrium, yielding a good balance of gains among the apps. Moreover, the solution automatically adapts to the running environment and to the underlying OS and hardware. Experiments show that it consistently produces significant speedups and energy savings across DNN workloads, hardware configurations, and running scenarios.
Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU

https://doi.org/10.1109/IPDPS53621.2022.00056

Chen, Jou-An; Sung, Hsin-Hsuan; Shen, Xipeng; Tallent, Nathan; Barker, Kevin; Li, Ang (May 2022, 36th IEEE International Parallel & Distributed Processing Symposium)

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Niu, Wei; Sun, Mengshu; Li, Zhengang; Chen, Jou-An; Guan, Jiexiong; Shen, Xipeng; Wang, Yanzhi; Liu, Sijia; Lin, Xue; Ren, Bin (May 2021, Proceedings of the AAAI Conference on Artificial Intelligence)
